110 research outputs found
Linear discriminant analysis for the small sample size problem: an overview
Dimensionality reduction is an important aspect in the pattern classification literature, and linear discriminant analysis (LDA) is one of the most widely studied dimensionality reduction techniques. Variants of LDA that address the small sample size (SSS) problem are applied in many research areas, e.g. face recognition, bioinformatics and text recognition. Improving the performance of these variants has great potential in various fields of research. In this paper, we present an overview of these methods. We cover the types, characteristics and taxonomy of the methods that can overcome the SSS problem. We also highlight some important datasets and software packages
A deterministic approach to regularized linear discriminant analysis
The regularized linear discriminant analysis (RLDA) technique is one of the popular methods for dimensionality reduction used for small sample size problems. In this technique, the regularization parameter is conventionally computed using a cross-validation procedure. In this paper, we propose a deterministic way of computing the regularization parameter in RLDA for the small sample size problem. The computational cost of the proposed deterministic RLDA is significantly lower than that of the cross-validation based RLDA technique. The deterministic RLDA technique is also compared with other popular techniques on a number of datasets, and favorable results are obtained
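As a rough illustration of why regularization helps in the small sample size setting, the sketch below builds a two-class, two-feature toy problem whose within-class scatter matrix is singular, then adds a term alpha * I before inverting it. The data, the function names and the fixed value alpha = 0.1 are assumptions made for this example only; in particular, it does not reproduce the deterministic rule for choosing the regularization parameter that the paper proposes.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def within_class_scatter(classes):
    """Sum of the per-class scatter matrices (2x2) over all classes."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in classes:
        m = mean(points)
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def rlda_direction(class_a, class_b, alpha):
    """Two-class direction w = (S_w + alpha * I)^-1 (mean_a - mean_b)."""
    s_w = within_class_scatter([class_a, class_b])
    reg = [[s_w[0][0] + alpha, s_w[0][1]],
           [s_w[1][0], s_w[1][1] + alpha]]
    det = det2(reg)
    inv = [[reg[1][1] / det, -reg[0][1] / det],
           [-reg[1][0] / det, reg[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Hypothetical toy data: too few samples, so S_w is singular.
class_a = [(0.0, 0.0), (1.0, 1.0)]
class_b = [(2.0, 0.0), (3.0, 1.0)]

s_w = within_class_scatter([class_a, class_b])
w = rlda_direction(class_a, class_b, alpha=0.1)  # alpha is an assumed value


def project(p):
    """Project a 2-D point onto the discriminant direction w."""
    return w[0] * p[0] + w[1] * p[1]
```

Here `det2(s_w)` is zero, so plain LDA cannot invert the scatter matrix, while the regularized matrix is invertible and the resulting projection keeps the two toy classes apart.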
Rotational linear discriminant analysis using Bayes Rule for dimensionality reduction
Linear discriminant analysis (LDA) finds an orientation that projects high dimensional feature vectors to a reduced dimensional feature space in such a way that the overlap between the classes in this feature space is minimized. This overlap is usually finite and produces a finite classification error, which is further reduced by the rotational LDA technique. The rotational LDA technique rotates the classes individually in the original feature space in a manner that enables further reduction of the error. In this paper we present an extension of the rotational LDA technique that uses Bayes decision theory for class separation, which improves the classification performance even further
Design and implementation of fuzzy based control system for natural gas pipes system based on LabVIEW
The quality of Natural Gas Piping Systems (NGPS) must be ensured against any manufacturing defects. For this purpose, we developed a special testing machine (STM), constructed in the lab, to test NGPS. The proposed STM tests the weak points at the pipe connections, e.g. pipe bends and intermediate connections. For more than 1500 pieces of NGPS, crack propagation was simultaneously tracked and monitored on the output screen at the critical positions of the pipeline connections. The control system uses LabVIEW tools for acquiring and monitoring various signals and for designing the control strategy
A filter based feature selection algorithm using null space of covariance matrix for DNA microarray gene expression data
We propose a new filter based feature selection algorithm for classification based on DNA microarray gene expression data. It utilizes the null space of the covariance matrix for feature selection. The algorithm can perform bulk reduction of features (genes) while retaining the discriminative information in the reduced subset of features. Thus, it can be used as a pre-processing step for other feature selection algorithms. The algorithm does not assume statistical independence among the features. The algorithm shows promising classification accuracy when compared with other existing techniques on several DNA microarray gene expression datasets
MoRFPred-plus: Computational Identification of MoRFs in Protein Sequence using physicochemical properties and HMM profiles
Intrinsically Disordered Proteins (IDPs) lack stable tertiary structure and they actively participate in performing various biological functions. These IDPs expose short binding regions called Molecular Recognition Features (MoRFs) that permit interaction with structured protein regions. Upon interaction they undergo a disorder-to-order transition as a result of which their functionality arises. Predicting these MoRFs in disordered protein sequences is a challenging task.
In this study, we present MoRFpred-plus, an improved predictor over our previously proposed predictor, to identify MoRFs in disordered protein sequences. Two independent propensity scores are computed by incorporating physicochemical properties and HMM profiles, and these scores are combined to predict the final MoRF propensity score for a given residue. The first score reflects the likelihood that a query residue is part of a MoRF region based on the composition and similarity of assumed MoRF and flank regions. The second score reflects the likelihood that a query residue is part of a MoRF region based on the properties of the flanks around the given residue in the query protein sequence. The propensity scores are processed and common averaging is applied to generate the final prediction score of MoRFpred-plus.
The performance of the proposed predictor is compared with the available MoRF predictors MoRFchibi, MoRFpred and ANCHOR. On the previously collected training and test sets used to evaluate these predictors, the proposed predictor outperforms them and generates a lower false positive rate. In addition, MoRFpred-plus is a downloadable predictor, which makes it useful as it can be used as input to other computational tools
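The final combination step described in this abstract, averaging two per-residue propensity scores, can be sketched in a few lines. The function names, the example scores and the 0.5 cut-off below are hypothetical; the abstract does not state how the scores are processed before averaging or how a final decision threshold is chosen.

```python
def combine_scores(score_a, score_b):
    """Average two per-residue propensity score lists of equal length."""
    if len(score_a) != len(score_b):
        raise ValueError("score lists must cover the same residues")
    return [(a + b) / 2 for a, b in zip(score_a, score_b)]


def flag_residues(scores, threshold=0.5):
    """Indices of residues whose combined score exceeds an assumed threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical scores for a four-residue sequence.
first_score = [0.0, 0.5, 1.0, 0.25]
second_score = [1.0, 0.5, 0.5, 0.25]

combined = combine_scores(first_score, second_score)
candidates = flag_residues(combined)
```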
Detecting TCP SYN Flood Attack in the Cloud
In this paper, an approach to protecting virtual machines (VMs) against TCP SYN flood attacks in a cloud environment is proposed. The open source cloud platform Eucalyptus is deployed and experiments are carried out on this setup. We investigate attacks emanating from one VM to another in a multi-tenancy cloud environment. Various scenarios of the attack are executed on a webserver VM. To detect such attacks from a cloud provider’s perspective, a security mechanism involving a packet sniffer, a feature extraction process, a classifier and an alerting component is proposed and implemented. We experiment with k-nearest neighbor and artificial neural network classifiers for detecting the attack. The dataset obtained from the attacks on the webserver VM is passed through the classifiers. The artificial neural network produced an F1 score of 1 on the test cases, implying 100% accuracy in distinguishing malicious attack traffic from legitimate traffic. The proposed security mechanism shows promising results in detecting TCP SYN flood attack behaviors in the cloud
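To make the classification and evaluation steps more concrete, the following is a minimal sketch of a k-nearest neighbor classifier and an F1 computation. The two features (a scaled SYN rate and the fraction of completed handshakes), the training points and the label names are invented for illustration; the abstract does not list the features actually extracted from the sniffed packets.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive="attack"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical features: (scaled SYN rate, fraction of completed handshakes).
train = [
    ((0.10, 0.90), "normal"),
    ((0.20, 0.95), "normal"),
    ((0.15, 0.85), "normal"),
    ((0.90, 0.10), "attack"),
    ((0.95, 0.05), "attack"),
    ((0.80, 0.20), "attack"),
]

test_points = [(0.85, 0.15), (0.12, 0.90)]
test_labels = ["attack", "normal"]
predictions = [knn_predict(train, q) for q in test_points]
score = f1_score(test_labels, predictions)
```

An F1 score of 1 means the classifier produced no false positives and no false negatives for the attack class on the evaluated cases.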
Predicting MoRFs in protein sequences using HMM profiles
Background: Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs in IDPs using computational methods is a challenging task.
Methods: In this study, we introduce hidden Markov model (HMM) profiles to accurately identify the location of MoRFs in disordered protein sequences. Using a windowing technique, HMM profiles are utilised to extract features from protein sequences, and support vector machines (SVM) are used to calculate a propensity score for each residue. Two different SVM kernels with high noise tolerance are evaluated with a varying window size, and the scores of the SVM models are combined to generate the final propensity score to predict MoRF residues. The SVM models are designed to extract maximal information between MoRF residues, their neighboring regions (Flanks) and the remainder of the sequence (Others).
Results: To evaluate the proposed method, its performance was compared to that of other MoRF predictors: MoRFpred and ANCHOR. The results show that the proposed method outperforms these two predictors.
Conclusions: Using HMM profiles as a source of feature extraction, the proposed method improves the prediction of MoRFs in disordered protein sequences
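The windowing technique mentioned under Methods can be pictured as follows: for each residue, the profile rows of its neighbours are concatenated into one feature vector, which a classifier such as an SVM could then score. This is only a sketch under assumptions: the function name, the zero-padding at the sequence ends and the tiny two-column profile are illustrative, and the real HMM profiles have more columns per residue.

```python
def window_features(profile, window_size):
    """Build one flat feature vector per residue from a centered window.

    profile: list of per-residue rows (each row a list of floats).
    window_size: odd number of residues covered by each window.
    Positions outside the sequence are zero-padded (an assumed convention).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window is centered")
    half = window_size // 2
    pad = [0.0] * len(profile[0])
    features = []
    for i in range(len(profile)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(profile[j] if 0 <= j < len(profile) else pad)
        features.append(row)
    return features


# Hypothetical profile: three residues, two profile columns each.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_features = window_features(toy_profile, window_size=3)
```

Each resulting vector has window_size times the number of profile columns entries, so varying the window size, as the abstract describes, changes the feature dimension.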
Improving protein fold recognition using the amalgamation of evolutionary-based and structural-based information
Deciphering the three-dimensional structure of a protein sequence is a challenging task in biological science. Protein fold recognition and protein secondary structure prediction are transitional steps in identifying the three-dimensional structure of a protein. For protein fold recognition, evolutionary-based information of amino acid sequences from the position specific scoring matrix (PSSM) has recently been applied with improved results. On the other hand, the SPINE-X predictor has been developed and applied for protein secondary structure prediction. Several reported methods for protein fold recognition have only limited accuracy. In this paper, we have developed a strategy of combining evolutionary-based information (from PSSM) and predicted secondary structure using SPINE-X to improve protein fold recognition. The strategy is based on finding the probabilities of amino acid pairs (AAP). The proposed method has been tested on several protein benchmark datasets and an improvement of 8.9% in recognition accuracy has been achieved. We have achieved, for the first time, over 90% and 75% prediction accuracies for sequence similarity values below 40% and 25%, respectively. We also obtain 90.6% and 77.0% prediction accuracies, respectively, for the Extended Ding and Dubchak and the Taguchi and Gromiha benchmark protein fold recognition datasets, which are widely used in the literature
Application of cepstrum analysis and linear predictive coding for motor imaginary task classification
In this paper, classification of electroencephalography (EEG) signals of motor imaginary tasks is studied using cepstrum analysis and linear predictive coding (LPC). The Brain-Computer Interface (BCI) competition III dataset IVa, which contains motor imaginary tasks for the right hand and foot of five subjects, is used. The data was preprocessed by whitening and then filtering the signal, followed by feature extraction. A random forest classifier is then trained using the cepstrum and LPC features to classify the motor imaginary tasks. The resulting classification accuracy is found to be over 90%. This research shows that concatenating appropriate features of different types, such as cepstrum and LPC features, holds some promise for the classification of motor imaginary tasks, which can be helpful in the BCI context
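For readers unfamiliar with LPC features, the sketch below estimates linear prediction coefficients with the autocorrelation method and the Levinson-Durbin recursion. It is a generic textbook formulation, not the paper's pipeline: the function names, the model order and the synthetic decaying signal are assumptions, and the whitening, filtering, cepstrum and random forest steps are not shown.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal (no normalization)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion.

    Returns (coeffs, err), where the prediction of x[n] is
    sum(coeffs[k - 1] * x[n - k] for k in 1..order) and err is the
    final prediction error energy.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1 - k * k)
    return a[1:], err


# Synthetic signal in which each sample is half the previous one, so a
# first-order predictor should recover a coefficient close to 0.5.
signal = [0.5 ** n for n in range(40)]
coeffs_order1, err_order1 = lpc(signal, 1)
coeffs_order2, err_order2 = lpc(signal, 2)
```

In a feature extraction setting, the coefficient list returned for each EEG segment would form part of that segment's feature vector.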